
Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training



Abstract

In this paper we propose to use utterance-level Permutation Invariant Training (uPIT) for speaker independent multi-talker speech separation and denoising, simultaneously. Specifically, we train deep bi-directional Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) using uPIT, for single-channel speaker independent multi-talker speech separation in multiple noisy conditions, including both synthetic and real-life noise signals. We focus our experiments on generalizability and noise robustness of models that rely on various types of a priori knowledge, e.g. in terms of noise type and number of simultaneous speakers. We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure, on the speaker independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs). Specifically, we first show that LSTM RNNs can achieve large SDR and ESTOI improvements, when evaluated using known noise types, and that a single model is capable of handling multiple noise types with only a slight decrease in performance. Furthermore, we show that a single LSTM RNN can handle both two-speaker and three-speaker noisy mixtures, without a priori knowledge about the exact number of speakers. Finally, we show that LSTM RNNs trained using uPIT generalize well to noise types not seen during training.
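The core idea of uPIT, as summarized above, is that the network's speaker outputs are matched to the reference speakers by the permutation that minimizes the error over the whole utterance, rather than per frame. A minimal NumPy sketch of such a permutation-invariant loss is given below; the function name, array shapes, and the use of magnitude spectrograms with a mean-squared error are illustrative assumptions, not details taken from the paper.

```python
from itertools import permutations

import numpy as np


def upit_loss(estimates, targets):
    """Utterance-level permutation invariant MSE.

    Both arrays are assumed to have shape (num_speakers, time, freq),
    e.g. estimated and reference magnitude spectrograms. The loss is the
    minimum mean-squared error over all assignments of output streams
    to reference speakers, computed over the entire utterance.
    """
    n_speakers = estimates.shape[0]
    best = np.inf
    for perm in permutations(range(n_speakers)):
        # Pair output stream s with reference perm[s] and average the
        # squared error over all speakers, frames, and frequency bins.
        mse = np.mean((estimates - targets[list(perm)]) ** 2)
        best = min(best, mse)
    return best
```

Because the permutation is fixed per utterance rather than per frame, an output stream that tracks the "wrong" speaker consistently still incurs zero extra cost, which is what makes the criterion suitable for speaker independent training.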
